k. Median net migration rate per subcontinent.

Go online and find a data set which contains the 2020 population for the countries of the world together with ISO codes.

pop_data <- read.csv("../data/country_population_data.csv")
pop_data <- pop_data[c("Country.Code", "X2020")]
colnames(pop_data) <- c("ISO", "Population_2020")
full_data <- left_join(data_tibble, pop_data)
## Joining with `by = join_by(ISO)`
head(full_data)
## # A tibble: 6 × 9
##   Net_Migration_Rate Median_Age Youth_Unemployment_R…¹ ISO   Classification_2020
##                <dbl>      <dbl>                  <dbl> <chr> <ord>              
## 1               27.1       23.5                   35.8 SYR   Low                
## 2               15.5       37.2                   NA   VGB   High               
## 3               13.3       39.5                   14.2 LUX   High               
## 4               13         40.5                   13.8 CYM   High               
## 5               11.8       35.6                    9.1 SGP   High               
## 6               10.6       32.9                    5.3 BHR   High               
## # ℹ abbreviated name: ¹​Youth_Unemployment_Rate
## # ℹ 4 more variables: Country <chr>, Region <fct>, Continent <fct>,
## #   Population_2020 <dbl>

For the most part the join worked well. I chose “left_join” so that countries that appear in the population data but not in the original data are dropped (otherwise they would produce rows in which most of the values are NA). Only two problems occurred during the merge. For Kosovo, the original data set uses XKS as the ISO code, while the population data uses XKX; this was easily corrected by changing the country code in the population data. For Taiwan, no population data was available on the World Bank site, since Taiwan is counted as part of China there, so no figure for Taiwan alone is available.
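The recoding step for Kosovo is not shown in the code above. A minimal sketch of how it could be done, and how unmatched codes can be spotted, is given here in base R. The data frames `orig` and `pop` are hypothetical toy stand-ins for the two tables, and the population values are placeholders, not real figures.

```r
# Toy stand-ins for the two tables (hypothetical names, placeholder values)
orig <- data.frame(ISO = c("AUT", "XKS", "TWN"), stringsAsFactors = FALSE)
pop  <- data.frame(ISO = c("AUT", "XKX", "DEU"),
                   Population_2020 = c(100, 200, 300),
                   stringsAsFactors = FALSE)

# Codes in the original data that have no partner in the population data
unmatched_before <- setdiff(orig$ISO, pop$ISO)

# Recode Kosovo in the population data so that it matches the original data
pop$ISO[pop$ISO == "XKX"] <- "XKS"

# After the recode only the code without any population entry remains
unmatched_after <- setdiff(orig$ISO, pop$ISO)

# Base-R equivalent of a left join: keep every row of `orig`,
# drop rows that exist only in `pop`
merged <- merge(orig, pop, by = "ISO", all.x = TRUE)
```

In this toy example the code that stays unmatched ends up with NA in Population_2020, which is the same behaviour the left join shows for Taiwan.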

l. Scatterplot of median age and net migration rate in Europe

Make a scatterplot of median age and net migration rate for the countries of Europe. Scale the size of the points according to each country’s population.

scatterplot_data <- full_data %>% filter(Continent=="Europe")
ggplot(scatterplot_data) +
  aes(x = Median_Age, y = Net_Migration_Rate, size = Population_2020, color = Country) +
  geom_point(alpha = 0.7) +  # constant transparency is set outside aes()
  theme(legend.position = "none") +
  ggtitle("Median age vs. Migration Rate in Europe")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

The median age of most European countries is above 40 years. There is also no visible relationship between the net migration rate and the median age; most of the data points are concentrated in a “blob” in the middle.
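A visual impression like this can be backed up with a correlation coefficient. The following is only a sketch on a made-up toy data frame (`toy` and its values are hypothetical); on the real data one would pass the columns of scatterplot_data instead.

```r
# Made-up toy values, only to illustrate the call; not taken from the data set
toy <- data.frame(Median_Age         = c(40, 42, 44, 46, NA),
                  Net_Migration_Rate = c(1, -1, 1, -1, 2))

# Pearson correlation; rows with a missing value are dropped first
r_pearson <- cor(toy$Median_Age, toy$Net_Migration_Rate,
                 use = "complete.obs")

# Rank-based alternative that is less sensitive to outlying countries
r_spearman <- cor(toy$Median_Age, toy$Net_Migration_Rate,
                  use = "complete.obs", method = "spearman")
```

A value close to 0 would support the reading that there is no linear relationship, while the Spearman version checks for any monotone trend.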

m. Interactive plot

On the merged data set from Task k., using function ggplotly from package plotly re-create the scatterplot in Task l., but this time for all countries. Color the points according to their continent.

When hovering over the points, the name of the country and the values for median age, net migration rate, and population should be shown. (Hint: use the aesthetic text = Country. In ggplotly use the argument tooltip = c("text", "x", "y", "size")).

p <- ggplot(full_data) +
  aes(x = Median_Age, y = Net_Migration_Rate, size = Population_2020, color = Continent, text = Country) +
  geom_point(alpha = 0.7) +  # constant transparency is set outside aes()
  theme(legend.position = "none") +
  ggtitle("Median age vs. Migration Rate, Worldwide")
pltly <- ggplotly(p, tooltip=c("text", "x", "y", "size"))
pltly

n. Parallel coordinate plot

In parallel coordinate plots each observation or data point is depicted as a line traversing a series of parallel axes, corresponding to a specific variable or dimension. It is often used for identifying clusters in the data.

One can create such a plot using the GGally R package. You should create such a plot where you look at the three main variables in the data set: median age, youth unemployment rate and net migration rate. Color the lines based on the income status. Briefly comment.

library(GGally)
ggparcoord(data=full_data, columns=c(1:3), groupColumn="Classification_2020")

The net migration rate and the median age tend to increase with the income class, while the youth unemployment rate is spread more evenly across the classes (although most of the high-income countries still seem to have a fairly low youth unemployment rate).
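When reading the plot it helps to keep in mind that ggparcoord, with its default setting scale = "std", standardizes every variable (subtract the mean, divide by the standard deviation) before drawing the lines. The axes therefore show relative positions, not the original units. A small base-R sketch of this rescaling on made-up values (the data frame `toy` is hypothetical):

```r
# Made-up toy values, only to illustrate the rescaling
toy <- data.frame(Median_Age         = c(20, 30, 40),
                  Net_Migration_Rate = c(-2, 0, 2))

# Column-wise standardization: (x - mean(x)) / sd(x)
standardized <- as.data.frame(scale(toy))

# Both columns now live on the same scale although the
# original units (years vs. a rate) were very different
standardized
```

Because of this, a line that sits high on one axis only means that the country is above average for that variable.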

o. World map visualisation

Using the package rworldmap, create a world map of the median age per country. Use the vignette to find how to do this in R.

library(rworldmap)
cdatamap <- joinCountryData2Map(full_data, joinCode = "ISO3", nameJoinColumn = "ISO")
## 215 codes from your data successfully matched countries in the map
## 3 codes from your data failed to match with a country code in the map
## 29 codes from the map weren't represented in your data
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapCountryData(cdatamap, nameColumnToPlot = "Median_Age")